Abstract 

The present paper discusses the impact of sampling error on the 
construction of confidence intervals around effect sizes. 

Sampling error affects the location and precision of confidence 
intervals. Meta-analytic re-sampling demonstrates that confidence 
intervals can haphazardly "bounce" around the true population 
parameter. Special software with graphical output is used to make 
this discussion concrete using a small heuristic sample. 

Sampling theory and confidence intervals for effect sizes: 
using ESCI to illustrate "bouncing" confidence intervals 

Statistical significance testing is one of the most 
frequently used statistical techniques in educational and 
psychological research. Huberty and Pike (1999) provide a thorough 
history of statistical significance in the social sciences. 

Finch, Cumming, and Thompson (2001) review the use of statistical 
significance testing in the Journal of Applied Psychology across 
six decades. Without doubt, null hypothesis significance testing 
has dominated the social science literature as the primary means 
of evaluating the import of findings. However, it has been 
well-documented that, given a large enough sample, a statistically 
significant result may always be found even when there is very 
little association between the independent and dependent 
variables (Cohen, 1994; Craig, Eison, & Metze, 1976). Therefore, 
some statisticians have suggested reporting effect sizes and making 
a distinction between statistical and practical significance 
(Cohen, 1994; Henson & Smith, 2000; Thompson, 1988). The 
reporting and interpretation of effect sizes are critical to good 
statistical practice. As the APA Task Force on Statistical 
Inference noted (Wilkinson & APA Task Force on Statistical 
Inference, 1999): 

It is hard to imagine a situation in which a 
dichotomous accept-reject decision is better than 
reporting an actual p value or, better still, a 
confidence interval... Always provide some effect-size 
estimate when reporting a p value. (p. 599, emphasis 
added) 

Influenced by the Task Force report and current trends in 
the field, the recent fifth edition of the APA Publication Manual 
(APA, 2001) called the "failure to report effect sizes" a "defect 
in the design and reporting of research" (p. 5). The Publication 
Manual later observed: "For the reader to fully understand the 
importance of your findings, it is almost always necessary to 
include some index of effect size or strength of relationship in 
your Results section" (p. 25). 

The fifth edition Publication Manual went further and also 
emphasized the role of confidence intervals (CIs) in result 
interpretation: 

The reporting of confidence intervals can be an 
extremely effective way of reporting results. Because 
confidence intervals combine information on location 
and precision and can often be directly used to infer 
significance levels, they are, in general, the best 
reporting strategy. The use of confidence intervals is 
therefore strongly recommended. (p. 22, emphasis added) 

Current trends in social science research support the 
inclusion and interpretation of effect sizes. The APA Publication 
Manual (APA, 2001) supports confidence intervals as one of the 
best reporting strategies. Therefore, the use of confidence 
intervals around effect sizes has obvious benefits for 
understanding research results. A confidence interval around an 
effect size informs the reader of (a) the point estimate of the 
population effect and (b) the degree of precision based on some 
level of confidence. Cumming and Finch (2001) provide an 
accessible introduction to the application of confidence 
intervals for effect sizes (i.e., Cohen's d). 

The purpose of the present paper is to discuss the use of 
confidence intervals around effect sizes. More specifically, the 
impact of sampling error on the degree of precision in confidence 
intervals is examined. Because samples vary in the amount of 
sampling error present, the width of confidence intervals can 
vary from sample to sample even at the same level of confidence 
(e.g., 95%). Further, the possibility that confidence intervals 
around effects can be examined meta-analytically across studies 
is explored. 

Definition of Confidence Intervals 

Many textbooks give common definitions of confidence 
intervals (Fidler & Thompson, 2001). For example, Hinkle, 
Wiersma, and Jurs (1998) defined a confidence interval (CI) in 
the one-sample case for the mean as "a range of values that we are 
confident (but not certain) contains the population parameter" 
(p. 220). Moore and McCabe (1993) provided a more general 
definition: "A level C confidence interval for a parameter $\theta$ is 
an interval computed from sample data by a method that has 
probability C of producing an interval containing the true value 
of $\theta$" (p. 433). In reality, whether or not a confidence interval 
captures a population parameter is dichotomous: either the 
interval captures the parameter or it does not (Cumming & Finch, 
2001; Thompson, 2001). The level of confidence for an interval, 
say 95% or 99%, actually refers to the percentage of intervals 
that would capture the parameter across many studies. The 
meta-analytic perspective may help some researchers keep in mind that 
a given interval actually may not contain the population 
parameter even if the interval does not capture the null 
hypothesis value. 

Purpose for Using Confidence Intervals with Effect Sizes 

CIs are usually used in hypothesis testing. However, as 
Thompson (1998) stated, "If we mindlessly interpret a CI with 
reference to whether the interval subsumes zero, we are doing 
little more than nil hypothesis statistical testing" (p. 800). 

CIs are most useful in comparison with intervals from 
related prior studies, instead of comparing with zero as the 
assumed value of the nil null hypothesis. In this vein, CIs can 
be most useful in meta-analysis as researchers meta-analytically 
evaluate CIs across studies. In fact, confidence intervals 
themselves can be meta-analytically synthesized, which allows 
researchers to get, based on a history of research, the best 
point estimate of a population effect and a best estimate of the 
confidence interval that should be around that effect. Just as 
the meta-analysis of effects yields greater accuracy in 
estimating true population effects, meta-analysis of confidence 
intervals increases our level of precision and generally yields a 
narrower meta-analytic interval. 
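
One common way to make this synthesis concrete is fixed-effect, inverse-variance weighting, sketched below in Python. This is only an illustration under stated assumptions: the paper does not specify a pooling method, the three study means and standard errors are hypothetical, and the function name pooled_interval is invented for this example. A normal critical value is used for simplicity.

```python
import math
from statistics import NormalDist

def pooled_interval(means, std_errors, confidence=0.95):
    """Fixed-effect (inverse-variance) pooling of independent estimates.

    Returns (pooled_mean, lower, upper). Uses a normal critical value,
    which is an approximation when the standard errors are estimated.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # precision weights
    total = sum(weights)
    pooled_mean = sum(w * m for w, m in zip(weights, means)) / total
    pooled_se = math.sqrt(1.0 / total)
    z_cv = NormalDist().inv_cdf(0.5 + confidence / 2)
    return pooled_mean, pooled_mean - z_cv * pooled_se, pooled_mean + z_cv * pooled_se

# Hypothetical means and standard errors from three studies (not from the paper).
study_means = [48.0, 53.0, 50.5]
study_ses = [5.0, 4.0, 6.0]

pooled_mean, lower, upper = pooled_interval(study_means, study_ses)
print(f"pooled mean = {pooled_mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because the pooled standard error is the square root of the reciprocal of the summed weights, it is always smaller than the smallest single-study standard error, which is why the pooled interval is narrower than any of the individual intervals.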

Recently, Cumming and Finch (2001) presented new software 
that automates and illustrates the concepts of non-central 
distributions and CIs for effect sizes. This user-friendly 
software runs under Microsoft Excel and is called Exploratory 
Software for Confidence Intervals (ESCI, pronounced "esky"). 
Using this software, the present paper illustrates the concept of 
sampling theory in the calculation of effects (and subsequently 
CIs) across studies, including the influence of sampling error. 

Sampling Error and Sampling Distributions 

In inferential statistics, a hypothesis is tested based on 
the assumption that the researcher can logically establish a 
hypothesized value for a population parameter, or in other words, 
we hope to make inferences about the population on the basis of 
what we observe in the sample. Of course, although we want to 
make inferences from the sample to the population, what we 
actually do in null hypothesis testing is to assume something to 
be true in the population (i.e., the null hypothesis) and then 
evaluate the likelihood of our sample data given the null. Therefore, 
the inference is actually from the population to the sample; not 
what we hope to do, but what we do nonetheless (cf. Cohen, 1994). 

The general formula for constructing a confidence interval is 
CI = statistic ± (critical value)(standard error of the 
statistic). CIs can be constructed around a great number of 
statistics (e.g., mean, coefficient alpha, kurtosis), but 
commonly researchers evaluate means. For example, if we assume 
the population mean ($\mu$) equals the sample mean ($\bar{X}$), the 
population variance ($\sigma^2$) is known, and the sampling distribution 
of the mean is normally distributed, the formula for the CI becomes: 

$$CI = \bar{X} \pm (Z_{cv})(\sigma_{\bar{X}})$$

where 

$\bar{X}$ = sample mean, 

$Z_{cv}$ = critical value using the normal distribution ($Z_{cv}$ equals 1.96 
when $\alpha = .05$), and 

$\sigma_{\bar{X}}$ = standard error of the mean (Hinkle et al., 1998, p. 219). 

Hinkle et al. also suggested that when computing confidence intervals 
for which the population variance ($\sigma^2$) is unknown, we need to 
find the critical value by using the t distribution rather than the 
normal distribution. 
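
The known-variance formula can be sketched in a few lines of Python. The function name z_interval and the input values (a sample mean of 52, a population standard deviation of 20, and n = 16) are assumptions chosen only for illustration; the critical value comes from the standard library's NormalDist.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, alpha=0.05):
    """CI for a mean when the population SD (sigma) is assumed known."""
    z_cv = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    standard_error = sigma / math.sqrt(n)       # standard error of the mean
    half_width = z_cv * standard_error
    return sample_mean - half_width, sample_mean + half_width

# Illustrative inputs only (hypothetical, not taken from the paper's data).
lower, upper = z_interval(sample_mean=52.0, sigma=20.0, n=16)
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
```

When the population variance is unknown, the same structure applies with the sample standard deviation in place of sigma and a t critical value (n - 1 degrees of freedom) in place of the normal one.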

The above formula makes clear that as the standard error of 
a sampling distribution increases, so does the width of the CI. 
This is important because it speaks to the degree of sampling 
error present in our sample, with more sampling error 
theoretically yielding larger standard errors. If we correctly 
conceptualize standard error as nothing more or less than the 
standard deviation of the sampling distribution, then we can 
understand that larger samples will yield smaller standard 
errors. Furthermore, holding sample size constant, smaller sample 
standard deviations will generally yield smaller standard errors. 

All of this informs us that as sample size gets bigger and 
sample standard deviation gets smaller, the confidence interval 
will become narrower and more precise, 
theoretically due to less sampling error and a smaller standard 
error. These same concepts will apply to CIs around effect sizes, 
although the derivation of CIs around effects is somewhat 
different. 
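
As a rough numerical illustration of this relationship, the Python sketch below tabulates the full width of a 95% interval for a few sample sizes and standard deviations. The values are hypothetical, the helper name interval_width is invented here, and a normal critical value is used for simplicity (with a t critical value the pattern is the same, only slightly more pronounced at small n).

```python
import math
from statistics import NormalDist

Z_CV = NormalDist().inv_cdf(0.975)  # two-tailed critical value for 95% confidence

def interval_width(sd, n):
    """Full width of a 95% CI for the mean: 2 * critical value * SE."""
    return 2 * Z_CV * sd / math.sqrt(n)

# Hypothetical standard deviations and sample sizes.
for sd in (20.0, 10.0):
    for n in (16, 64, 256):
        print(f"SD = {sd:4.1f}, n = {n:3d}: width = {interval_width(sd, n):5.2f}")
```

Quadrupling the sample size halves the width, and halving the standard deviation does the same, since both changes halve the standard error.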

Central and Non-central Test Distributions 

When we assume the null hypothesis is true in the population, 
the appropriate test distribution is the central distribution 
(e.g., central t test or central F test). Most of the popular 
statistical packages such as SPSS and SAS calculate CIs based on 
normal or "central" t-test statistic distributions. In the 
central distribution scenario, the confidence interval is 
obtained by "inverting" the hypothesis test, which means that the 
test of the null hypothesis is performed by checking to see 
whether the sample statistic falls within the interval 
$\mu_0 \pm t(\text{standard error})$. However, when we 
believe the "nil" null hypothesis is not true, noncentral test 
distributions are necessary to compute accurate CIs for certain 
effect statistics (such as Cohen's d, $R^2$, $\eta^2$) (Cohen, 1994; Cumming 
& Finch, 2001; Fidler & Thompson, 2001; Smithson, 2001). The 
reader is referred to Cumming and Finch (2001) and Smithson 
(2001) for reviews of CIs from central and non-central 
distributions. 

Heuristic Example 

Cumming's Exploratory Software for Confidence Intervals 
(ESCI) has six functions: NonCentral t, Power, CIjumping, 
CIoriginal, CIdelta, and MAthinking (Cumming, 2001). Two 
functions are relevant to the present paper: CIoriginal and 
CIjumping. CIoriginal calculates and displays CIs for data with 
three simple experimental designs: Case 1 for a single group, Case 
2 for two independent groups, and Case 3 for paired data. 
CIjumping takes repeated samples from a population to illustrate 
basic concepts of CIs and sampling error. To simplify the 
demonstration, I will use Case 1 for a single group and sampling 
error to illustrate CIs for effect size in this paper. 

One-Sample Case with the CIoriginal Function 

To help in understanding a confidence interval, heuristic data 
were developed to illustrate CIs with CIoriginal (see Figure 1). 
The data for one sample were: 22, 23, 26, 31, 32, 33, 33, 35, 36, 
49 (n = 10, M = 32, SD = 7.70, CI = 32 ± 5.51, $\mu_0 = 26$, t = 2.46). This 
figure illustrates the sample and the corresponding CI. 



INSERT FIGURE 1 ABOUT HERE 

The ESCI output in Figure 1 clearly illustrates the CI of 
this sample. The CI is 32 ± 5.5103 (i.e., from 26.49 to 37.51) at 
the 95% confidence level. The interval does not capture the null 
value of 26, so the result is statistically significant (p = .036). 
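
The values reported for this sample can be checked by hand. The short Python sketch below recomputes the mean, standard deviation, 95% interval half-width, and t statistic against a null value of 26. It is not ESCI output: the critical value 2.262157 (two-tailed .05, df = 9) is a standard tabled t value supplied here as an assumption, since the Python standard library has no t distribution.

```python
import math
import statistics

# Heuristic one-sample data from the text.
data = [22, 23, 26, 31, 32, 33, 33, 35, 36, 49]
NULL_VALUE = 26          # hypothesized population mean
T_CRIT_DF9 = 2.262157    # assumption: tabled two-tailed .05 critical t, df = 9

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)            # sample SD (n - 1 in the denominator)
standard_error = sd / math.sqrt(n)
half_width = T_CRIT_DF9 * standard_error
t_stat = (mean - NULL_VALUE) / standard_error

print(f"n = {n}, M = {mean:.2f}, SD = {sd:.2f}, SE = {standard_error:.4f}")
print(f"95% CI = {mean:.2f} +/- {half_width:.4f}")
print(f"t = {t_stat:.2f}")
```

The lower limit of the interval (the mean minus the half-width) lies above 26, which agrees with the statistically significant t test.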

One-Sample Case with the CIjumping Function 

Another ESCI function, CIjumping, takes repeated, independent 
samples from a normal population. Here I set the population mean 
to 50 and the population standard deviation to 20 and let the 
software re-sample 15 times (n = 16 each time). ESCI generated 
graphical output of the sample mean, SD, and standard error, as 
seen in Table 1. 



INSERT TABLE 1 ABOUT HERE 



CIjumping also generated the sampling distribution of the 
mean as seen in Figure 2. Note that this distribution would 
become less variable as sampling error decreased, or as sample 
size gets bigger and sample standard deviation gets smaller. 



INSERT FIGURE 2 ABOUT HERE 



Figure 2 also illustrates the population parameters ($\mu = 50$, 
$\mu_0 = 50$). 



INSERT FIGURE 3 ABOUT HERE 



Figure 3 graphically demonstrates the means and confidence 
intervals for the 15 samples drawn as listed in Table 1. This 
figure visually demonstrates that, because sampling error varies 
from sample to sample, the location and the width of the 
intervals also vary from sample to sample. This demonstration 
reveals the fact that the confidence levels (e.g., 95% or 99%) 
that are used in constructing confidence intervals actually apply 
to the percentage of intervals that will capture the population 
parameter across many samples. If the demonstration had been 
continued infinitely, 95% of the intervals would have 
captured the population mean of 50. 
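
The same demonstration can be approximated outside ESCI. The Python sketch below is a stand-in for CIjumping, not a reproduction of it: it draws 15 samples of n = 16 from a normal population with a mean of 50 and a standard deviation of 20, prints each interval, and then estimates the long-run capture rate over many replications. The seeds are arbitrary, the function names are invented for this example, and the critical value 2.131450 (two-tailed .05, df = 15) is a tabled t value supplied as an assumption.

```python
import math
import random
import statistics

MU, SIGMA, N = 50.0, 20.0, 16   # population mean, SD, and per-sample n from the text
T_CRIT_DF15 = 2.131450          # assumption: tabled two-tailed .05 critical t, df = 15

def sample_interval(rng):
    """Draw one sample of size N; return its mean and 95% CI limits."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    half_width = T_CRIT_DF15 * statistics.stdev(sample) / math.sqrt(N)
    return mean, mean - half_width, mean + half_width

def coverage(replications, seed=0):
    """Proportion of simulated intervals that capture MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        _, lower, upper = sample_interval(rng)
        hits += lower <= MU <= upper
    return hits / replications

rng = random.Random(2001)  # arbitrary seed so the run is repeatable
for i in range(1, 16):
    mean, lower, upper = sample_interval(rng)
    flag = "" if lower <= MU <= upper else "  <- misses 50"
    print(f"{i:2d}  mean = {mean:6.2f}  CI = [{lower:6.2f}, {upper:6.2f}]{flag}")

print(f"long-run capture rate: {coverage(20000):.3f}")
```

Any single run of 15 intervals may capture the population mean 15, 14, or fewer times; only the long-run proportion settles near .95.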

Due to sampling error, even confidence intervals can 
haphazardly "bounce" around the population parameter, just as 
point estimates (i.e., the sample mean) also vary around the 
population mean. However, the synthesis of these confidence 
intervals meta-analytically would yield a much more precise 
indication of what the true population parameter might be. 

This graphical output illustrates that confidence intervals 
can be understood as a distribution across different samples: 
a given interval may or may not capture the population mean, and 
at the 95% confidence level about 95% of such intervals will do 
so in the long run (see Figure 3). However, the bigger the sample 
size, the narrower the intervals and the more precisely we can 
estimate the population parameter. 

Discussion 

Thompson (2001) asserted, "...like any other statistical 
estimates made using sample data, confidence intervals for effect 
sizes are impacted by sampling error variance" (p. 11). CIjumping 
output from ESCI software (see Figure 3) illustrates that each 
time we take a sample from the population, the CIs differ in terms 
of location and precision (i.e., width of interval). 

This reality raises the question of whether a single CI (for 
example, a 95% CI) in a given study can be described as being 95% 
likely to capture the population parameter (as is often the 
case). This view is problematic because on a binomial basis the 
interval either does or does not capture the population parameter 
(Thompson, 2001, p. 15). ESCI further demonstrates that the 
confidence levels (e.g., 95% or 99%) often associated with CIs 
do not refer to a single interval estimate of the 
population mean, or other parameter, but rather to the percentage 
of CIs across studies that capture the population mean. ESCI also 
makes clear the impact of sampling error on the location and 
precision of confidence intervals, such that confidence intervals 
across studies are likely to "bounce" around the true parameter. 

Figure 1. CIoriginal output 

Table 1 

CIjumping output 

No.   Sample mean    SD          SE         Bar Display 
1     40.07333571    23.58907    5.897267   12.56974 
2     46.27842016    23.154314   5.788578   12.33807 
3     56.78747654    22.226357   5.556589   11.8436 
4     55.54492772    15.531329   3.882832   8.276066 
5     44.80125297    18.544907   4.636227   9.88189 
6*    37.57639048    22.934678   5.73367    12.22103 
7     53.71488795    19.363909   4.840977   10.3183 
8     42.50622977    23.555014   5.888753   12.55159 
9     41.7082917     19.921418   4.980355   10.61538 
10    55.98832202    22.024905   5.506226   11.73625 
11    56.76437253    25.842126   6.460531   13.77031 
12    54.19252757    19.097086   4.774271   10.17612 
13    51.14443851    15.797775   3.949444   8.418045 
14    55.16810559    20.866201   5.21655    11.11882 
15    49.32266377    20.875924   5.218981   11.124 

Note: In case 6, the displayed interval did not capture the population 
mean of 50. (15 samples, n = 16 each) 



Figure 2. CIjumping output of sampling distribution of the mean 

Figure 3. CIjumping output of CIs around effect sizes (15 samples, 
n = 16 each; 14 intervals captured the population mean, the 
interval for case 6 did not) 